PREDICTING INVOLUNTARY SEPARATION OF ENLISTED PERSONNEL


I. INTRODUCTION

In the spring of 1977, Request for Personnel Research (RPR) 774, Development of Enlistment
Standards for the Armed Forces, was sent from the Air Force Military Personnel Center (AFMPC) (now
known as the Air Force Manpower and Personnel Center) to the Air Force Human Resources Laboratory
(AFHRL). The basic objectives of RPR 774 included the following: (a) to develop a substitute for the
current Air Force enlistment standard, (b) to evaluate the Military Service Inventory (MSI) as a predictor of
attrition, and (c) to test the relative efficiency of the Motivational Attrition Prediction (MAP) method in a
binary classification problem, such as the prediction of retention versus attrition. Evaluation of the request
by the various divisions of AFHRL eventually led to the decision to cancel RPR 774 and to establish two
new requests: RPR 77-13, Development of Alternative Air Force Enlistment Standards, and RPR 77-14,
Development of Improved Methods for Predicting Involuntary Separation. The two new requests replaced
the original so that research responsibility could be divided appropriately within AFHRL: the first dealt
with objectives (a) and (b) of RPR 774, and the second dealt with objective (c). The Personnel Research
Division (AFHRL/PE) was tasked with RPR 77-13 and the Computational Sciences Division (AFHRL/SM)
with RPR 77-14.

This report describes the research carried out by AFHRL/SM in support of RPR 77-14. The basic
problem concerns predicting involuntary separation (attrition) within the Air Force enlisted force. Specific
objectives of this study include the following: (a) to implement the MAP computer program on the AFHRL
UNIVAC 1108 computer system, (b) to compare the predictive efficiency of the MAP method with that of
the AFHRL multiple linear regression technique (referred to as TRICOR), (c) to compare MAP and
TRICOR with other predictive methodologies capable of handling binary criterion situations, and (d) to
evaluate the various predictive methodologies using other binary criteria such as graduation/elimination
from Technical Training (TT), Basic Military Training (BMT), and Undergraduate Pilot Training (UPT). This
last objective (d) is not addressed here but will be the subject of a subsequent report. The results included
here are restricted to predicting involuntary separation.

The next section describes the statistical methodologies compared. Three predictive methodologies
associated with regression theory were considered for use in this study. These methodologies will be
referred to as ordinary least squares (OLS), standardized least squares (SLS), and weighted least squares
(WLS). OLS was the methodology employed in the analyses described in Section V and, hence, is
discussed in the following section. SLS has been compared to MAP with regard to classification accuracy in
several problem settings (Beatty, 1977). Basically, standardized least squares produces a predictive model
that is independent of the units of measurement, since the independent variables are normalized to zero
mean and unit variance. This methodology was tested in the present study and, as expected, yielded
classification accuracy results equivalent to those for OLS, because standardizing the predictors rescales the
regression coefficients without changing the predicted criterion values. An in-depth examination of
the predictive efficiency of SLS will be conducted in the follow-up efforts referred to in objective (d) and
discussed briefly in the last section of this report. A consideration in applying OLS to a predictive problem
involving a binary criterion is that the error variances are unequal. Although OLS yields unbiased
estimates of the regression coefficients, the estimates are inefficient because they do not have the
minimum variance property among the class of unbiased estimators. Performing the WLS computations
(Draper & Smith, 1966) yields constant error variances, allowing a possible decrease in the variance
associated with each estimated regression coefficient. Although WLS offers a potential improvement over
OLS, its capability to accurately classify individuals as successes/failures was not examined in detail since
(a) a study using a quickly assembled WLS computer programming package produced classification
accuracy results similar to those for OLS, (b) some WLS analyses yielded nonsensical results, and (c)
implementation of an efficient WLS computer programming package to perform analyses similar to those
for OLS would not have allowed timely completion of the milestones associated with this research effort.
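The OLS/SLS/WLS distinction can be made concrete with a minimal sketch that uses a single predictor and a 0/1 criterion (a linear probability model). The scores, outcomes, function names, 0.5 classification cutoff, and clipping constant are all illustrative assumptions. This is not the TRICOR package or the WLS package used in the study. The sketch shows that standardizing the predictor changes the coefficients but not the classifications. It also computes two-step WLS weights from the binary error variance p(1 - p).

```python
from statistics import fmean, stdev

def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def ols_line(x, y):
    """OLS is WLS with equal weights."""
    return wls_line(x, y, [1.0] * len(x))

def standardize(x):
    """Rescale to zero mean and unit variance (the SLS predictor)."""
    m, s = fmean(x), stdev(x)
    return [(xi - m) / s for xi in x]

def classify(b0, b1, x, cutoff=0.5):
    """Assign 1 (success) when the predicted criterion reaches the cutoff."""
    return [1 if b0 + b1 * xi >= cutoff else 0 for xi in x]

def two_step_wls(x, y, eps=0.01):
    """Refit with weights 1 / (p(1 - p)), the inverse error variance of a 0/1
    criterion. OLS fitted values outside (0, 1) have no valid variance, so
    they are clipped into [eps, 1 - eps] first."""
    b0, b1 = ols_line(x, y)
    weights = []
    for xi in x:
        p = min(max(b0 + b1 * xi, eps), 1.0 - eps)
        weights.append(1.0 / (p * (1.0 - p)))
    return wls_line(x, y, weights)

# Hypothetical aptitude scores and success (1) / failure (0) outcomes.
scores = [20, 35, 40, 50, 55, 60, 70, 80, 85, 95]
outcome = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

ols_pred = classify(*ols_line(scores, outcome), scores)
z = standardize(scores)
sls_pred = classify(*ols_line(z, outcome), z)
print(ols_pred == sls_pred)  # True: same fitted values, hence same classes
print(two_step_wls(scores, outcome))
```

The clipping step exposes a well-known practical difficulty with these weights. OLS fitted values can fall outside (0, 1), as the fitted value for a score of 95 does here, so the weights are undefined without some ad hoc adjustment.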
Following the discussion of the statistical methodologies are sections on the airman population,
description of predictor variables, selection of subsamples, model formulation and analysis, and
comparison of computer resources required. Numerous tables are displayed for comparative purposes, and
results and recommendations are discussed last.


II. DESCRIPTION OF STATISTICAL METHODOLOGIES


The statistical methodologies examined in this study for their ability to correctly classify individuals
as successes/failures are the following: TRICOR, a computer programming package containing a stepwise
regression algorithm; MAP, a computerized algorithm based on maximum likelihood estimation and utility
theory; and BAYS, a computerized algorithm utilizing Bayes' formula. The stepwise regression theory of
TRICOR is described in numerous publications (Dixon, 1968; Draper & Smith, 1966; Efroymson, 1960;
Goldberger, 1961; Goldberger & Jochems, 1961; Pope & Webster, 1972), and the maximum likelihood
estimation and utility theory of MAP is documented in AFMPC publications (Dempsey & Fast, 1976;
Dempsey, Fast, & Sellman, 1977). A brief description of the important characteristics of BAYS is
presented here; a more detailed description is available in the computer-based SMSM program
documentation library at AFHRL.


Although reader familiarity with stepwise regression theory and MAP maximum likelihood estimation
and utility theory is assumed, researchers who want to use either program need to know how the
computerized implementations of the two methodologies on the AFHRL UNIVAC 1108 computer system
differ in their limitations. When interfaced with a compatible hit table
subroutine, TRICOR can accept a data file containing information on up to 399 predictor
variables and 9,999 cases per subsample. In contrast, the current version of MAP can accept a data file
containing information on up to 20 predictor variables, and the maximum number of cases allowable can be
estimated by the following formula:

$$\mathrm{NCASES} = \frac{160{,}000}{\mathrm{NVARS} + 3}$$

where NCASES represents the number of cases and NVARS represents the number of independent
variables. For example, MAP problems utilizing 5, 7, 13, or 17 independent variables allow processing of
data files containing approximately $2 \times 10^4$, $1.6 \times 10^4$, $10^4$, or $8 \times 10^3$ cases, respectively. An important
consideration for a potential MAP user is that the program utilizes an iterative technique (Brown, 1967) to
solve a system of simultaneous nonlinear equations. As will be observed in subsequent analyses, the
computerized algorithm does not always converge. When it fails, the researcher cannot directly compare the
predictive accuracy of MAP with that of TRICOR or BAYS.
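The capacity formula is simple to tabulate before a problem is submitted to MAP. The sketch below reproduces the four worked values above. The 20-predictor check mirrors the stated limit. The rounding down to a whole number of cases, the function names, and the companion check against the quoted TRICOR limits (399 predictors, 9,999 cases per subsample) are assumptions of this illustration.

```python
def map_max_cases(nvars: int) -> int:
    """Approximate MAP case limit: NCASES = 160,000 / (NVARS + 3), rounded down."""
    if not 1 <= nvars <= 20:
        raise ValueError("the MAP version described accepts 1 to 20 predictors")
    return 160_000 // (nvars + 3)

def programs_that_fit(nvars: int, ncases: int) -> list:
    """Hypothetical helper: which of the two programs can accept this problem size."""
    fits = []
    if 1 <= nvars <= 399 and ncases <= 9_999:
        fits.append("TRICOR")
    if 1 <= nvars <= 20 and ncases <= map_max_cases(nvars):
        fits.append("MAP")
    return fits

for k in (5, 7, 13, 17):
    print(k, map_max_cases(k))  # prints 5 20000, 7 16000, 13 10000, 17 8000
```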

BAYS, a computer program whose development was based on the Attribute Bayesian Classification
Decision (ABCD) technique (Moonan, 1972), utilizes Bayes' formula to compute probabilities of class
membership for each case. Each individual is then assigned to the criterion category that has
the highest a posteriori probability. An important improvement to the ABCD technique was the
implementation of a stepwise procedure in the model-building algorithm whereby variables can be
eliminated after they have been added to the predictive scheme. Hit tables, which indicate the number of
cases correctly classified, are used to select the predictor variables that most effectively discriminate among
the criterion categories. At each stage of the model-building procedure, the predictive composite is formed
that corresponds to the highest classification accuracy resulting from all possible additions (or deletions)
of one variable to (or from) the predictive composite existing at the previous stage. As described in Section
V, several random samples of the population were constructed specifically to estimate the empirical
probabilities.
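The sketch below covers the two ideas in this paragraph: classification by highest a posteriori probability, and stepwise addition or deletion driven by hit-table accuracy. It assumes, purely for illustration, that the predictors are conditionally independent given the criterion category (a naive Bayes simplification). It also applies Laplace smoothing to the empirical frequencies. The ABCD formulation used by BAYS may combine attributes differently. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict

def estimate(rows, labels):
    """Empirical class counts and per-variable category counts within each class."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (variable, class) -> category counts
    cats = defaultdict(set)      # variable -> categories observed
    for row, c in zip(rows, labels):
        for v, value in enumerate(row):
            cond[(v, c)][value] += 1
            cats[v].add(value)
    return prior, cond, {v: len(s) for v, s in cats.items()}

def posterior_class(row, model, variables):
    """Category with the highest (unnormalized) a posteriori probability."""
    prior, cond, ncat = model
    n = sum(prior.values())
    best_class, best_score = None, -1.0
    for c in sorted(prior):  # ties go to the lower-coded category
        score = prior[c] / n
        for v in variables:
            score *= (cond[(v, c)][row[v]] + 1) / (prior[c] + ncat[v])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def hit_table(rows, labels, model, variables):
    """Counts of (actual, predicted) pairs."""
    return Counter((a, posterior_class(r, model, variables)) for r, a in zip(rows, labels))

def accuracy(table):
    return sum(n for (a, p), n in table.items() if a == p) / sum(table.values())

def stepwise(est_rows, est_labels, val_rows, val_labels, max_steps=20):
    """Add or delete one variable per stage, keeping the most accurate composite."""
    model = estimate(est_rows, est_labels)  # empirical probabilities
    nvars = len(est_rows[0])
    current = []
    current_acc = accuracy(hit_table(val_rows, val_labels, model, current))
    for _ in range(max_steps):
        candidates = [current + [v] for v in range(nvars) if v not in current]
        candidates += [[u for u in current if u != v] for v in current]
        if not candidates:
            break
        best_acc, best_set = max(
            ((accuracy(hit_table(val_rows, val_labels, model, s)), s) for s in candidates),
            key=lambda t: t[0])
        if best_acc <= current_acc:  # stop when no single change helps
            break
        current, current_acc = best_set, best_acc
    return current, current_acc

# Variable 0 separates the categories; variable 1 is noise (hypothetical data).
# The study drew its empirical probabilities from separate subsamples; the
# same rows are reused here only to keep the demonstration small.
est = [(0, 0), (0, 1), (1, 0), (1, 1)]
lab = [0, 0, 1, 1]
print(stepwise(est, lab, est, lab))  # ([0], 1.0)
```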

Aside from run time constraints, which will be discussed later, BAYS can accept a
data file containing information on 200 independent variables having 63 categories each; however, the total
number of categories for all independent variables must not exceed 2,000. Since the BAYS algorithm is
restricted to analysis of categorical independent and dependent variables, each independent variable was
categorized with the aim of minimizing the amount of
information lost in the process. This categorization requirement precluded a BAYS analysis of models
containing interactive terms. At present, BAYS cannot classify individuals in an
operational setting because it does not retain information on the disposition of each case; information is
retained only on the disposition of a group of cases in the form of a hit table. A proposed work effort
would rectify this deficiency. The same effort also proposes that BAYS be
modified to utilize a variable packing factor for storing cases on a record, dynamic storage allocation, and
computational shortcuts to decrease the number of data file passes.
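Because BAYS accepts only categorical predictors, each continuous score must be cut into categories that respect the program limits quoted above. The sketch below uses equal-frequency (quantile) cut points. That choice is one common way to limit information loss, but it is an assumption here, since the report does not state how the categories were chosen. The function names and scores are hypothetical.

```python
from bisect import bisect_right

def quantile_cuts(values, k):
    """Cut points splitting the sorted values into k groups of roughly equal size."""
    s = sorted(values)
    return [s[(i * len(s)) // k] for i in range(1, k)]

def categorize(values, cuts):
    """Category index 0..len(cuts) for each value (a value equal to a cut goes up)."""
    return [bisect_right(cuts, v) for v in values]

def within_bays_limits(categories_per_variable, max_vars=200, max_cats=63, max_total=2000):
    """Check the variable-count, per-variable category, and total category limits."""
    return (len(categories_per_variable) <= max_vars
            and all(1 <= c <= max_cats for c in categories_per_variable)
            and sum(categories_per_variable) <= max_total)

scores = [31, 45, 52, 58, 60, 64, 71, 77, 83, 90, 94, 99]  # hypothetical scores
cuts = quantile_cuts(scores, 4)
print(cuts, categorize(scores, cuts))  # [58, 71, 90] [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

For example, 31 variables with 63 categories each (1,953 in total) would be acceptable, but a 32nd such variable would exceed the 2,000-category ceiling.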


III. FIRST-TERM AIRMAN POPULATION

The population for this study, which consisted of 11,231 airmen who entered the Air Force between
April and July 1972, was selected for two major reasons: (a) the data were immediately available since the
population comprised a data file prepared to support RPR 77-13, and (b) the population was
characteristically similar to the one examined by Dempsey et al. (1977), which consisted of airmen who
entered the Air Force between June and August 1972. So that each case could be classified into a
criterion category in a meaningful way, separation designation numbers (SDNs) were grouped and recoded
as follows: SDNs reflecting normal separations or active duty status were recoded to a value
of one, and SDNs reflecting undesirable losses such as marginal productivity/inaptitude, unfitness, or
unsuitability were recoded to a value of zero. This definition of the criterion categories was coordinated
with AFMPC. As a result, of the 11,231 airmen in the population, 7,694 were recoded to "one" and the
remaining 3,537 cases were recoded to "zero" (i.e., 68.5% of the cases were coded as successes, and 31.5%
were coded as failures).
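The recoding and the resulting population base rate can be sketched as follows. The report names the types of undesirable loss but not the SDN codes themselves, so the string labels below are hypothetical stand-ins. The rule that every unlisted type is coded 1 is a simplification. The base-rate function computes the percentage that would be classified correctly if every case were called a success. This is the same quantity Section V defines as the base rate.

```python
# Hypothetical groupings: the report names these loss types but not the SDN codes.
UNDESIRABLE_LOSSES = {"marginal productivity/inaptitude", "unfitness", "unsuitability"}

def criterion(separation_type: str) -> int:
    """1 = normal separation or still on active duty; 0 = undesirable loss.
    Simplification: every type not listed as an undesirable loss is coded 1."""
    return 0 if separation_type in UNDESIRABLE_LOSSES else 1

def base_rate_pct(codes) -> float:
    """Percentage correct if every case were classified as a success (code 1)."""
    return 100.0 * sum(codes) / len(codes)

population = [1] * 7_694 + [0] * (11_231 - 7_694)
print(round(base_rate_pct(population), 1))  # 68.5
```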


IV. DESCRIPTION OF PREDICTOR VARIABLES

As previously mentioned in Section III, predictor variable information was available from a data file
prepared to support RPR 77-13. Information used in the creation of that file originated from a data file
created for a previous work effort and from the Processing and Classification of Enlistees (PACE) file at
AFHRL. Complete information on the following variables was available for all airmen in the population:

1. Scores from the aptitude tests (Administrative, Mechanical, Electrical, and General) of the Armed
Services Vocational Aptitude Battery (ASVAB).

2. Scores from the Armed Forces Qualification Test (AFQT).

3. Prediction of drug use admission (PDA) score (LaChar, Sparks, Larsen, & Bisbee, 1974).

4. Military Service Inventory (MSI) score (Dempsey et al., 1977).

5. Education - Number of years required to reach highest level of education.

6. Dependents - Coded as 0 (1) denoting number of dependents at enlistment less than or equal to
2 (greater than 2); i.e., this variable was assigned a value of 0 if the number of dependents at enlistment
was less than three, and a value of 1 if the number of dependents at enlistment was greater than two.

7. High school courses - The following courses were coded as 1 (0) denoting completion
(noncompletion):

a. Algebra

b. Biology

c. Chemistry

d. Art

e. Geometry

f. Photography

g. Physics

h. Trigonometry

i. English

j. Home Economics

8. Age - Age in years at enlistment.
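The 0/1 codings in items 6 and 7 are simple threshold and membership rules. The sketch below expresses them in code. The function names and the use of course names as dictionary keys are illustrative choices, not part of the original data file layout.

```python
COURSES = ("Algebra", "Biology", "Chemistry", "Art", "Geometry", "Photography",
           "Physics", "Trigonometry", "English", "Home Economics")

def code_dependents(n_dependents: int) -> int:
    """Item 6: 0 if two or fewer dependents at enlistment, 1 if more than two."""
    return 1 if n_dependents > 2 else 0

def code_courses(completed) -> dict:
    """Item 7: 1 (0) for completion (noncompletion) of each listed high school course."""
    done = set(completed)
    return {course: int(course in done) for course in COURSES}

record = code_courses(["Algebra", "English"])  # hypothetical enlistee
print(code_dependents(3), record["Algebra"], record["Art"])  # 1 1 0
```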

Tables A1 through A13 in Appendix A present distributions, means, standard deviations, and
intercorrelations of the predictor variables for the 11,231-case population. Many of the aforementioned
variables were recoded (transformed) during the analysis phase of the study; a description of each
transformation is deferred until the next section.


V. DATA ANALYSIS


Creation and Characteristics of Subsamples

Three random samples each of 1,500, 3,000, and 6,000 cases were drawn from the population, with
the requirement that the three samples of each particular size contain 10%, 35%, and 50% involuntary
dischargees, respectively. Each case could appear only once in each sample but could appear in more than
one sample. Each of the nine samples was randomly separated into three subsamples. A schematic
representation of the subsample layout is shown in Figure 1. Hereafter, the term "base rate" referred to in
the figure is defined as the percentage of correct classifications that would result if all individuals in the
subsample were classified into the criterion category representing normal separations or active duty status.
Subsamples $3N + 1$ and $3N + 2$, $N = 0, 1, \ldots, 8$, were used as validation and cross-validation
subsamples, respectively, in the analysis of each subsample size/base rate combination. The empirical
probabilities for the BAYS computations were derived from subsamples $3N + 3$, $N = 0, 1, \ldots, 8$.
Although a wide range of base rates was studied so that the subject methodologies could be compared in a
variety of problem settings, attention was primarily focused on the 65% subsample base rate, which closely
approximates the 68.5% population base rate.


Sample #   Sample size   Subsample #   Subsample size   P    Q
1          1,500         1             500              90   10
1          1,500         2             500              90   10
1          1,500         3             500              90   10
2          1,500         4             500              65   35
2          1,500         5             500              65   35
2          1,500         6             500              65   35
3          1,500         7             500              50   50
3          1,500         8             500              50   50
3          1,500         9             500              50   50
4          3,000         10            1,000            90   10
4          3,000         11            1,000            90   10
4          3,000         12            1,000            90   10
5          3,000         13            1,000            65   35
5          3,000         14            1,000            65   35
5          3,000         15            1,000            65   35
6          3,000         16            1,000            50   50
6          3,000         17            1,000            50   50
6          3,000         18            1,000            50   50
7          6,000         19            2,000            90   10
7          6,000         20            2,000            90   10
7          6,000         21            2,000            90   10
8          6,000         22            2,000            65   35
8          6,000         23            2,000            65   35
8          6,000         24            2,000            65   35
9          6,000         25            2,000            50   50
9          6,000         26            2,000            50   50
9          6,000         27            2,000            50   50

P - base rate (%); Q - percentage of involuntary dischargees (100 - P)

Figure 1. Subsample layout.
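The subsample construction can be sketched as follows. Each sample is drawn without repeated cases, and a case may reappear in a later sample. The sample is then split into three subsamples whose roles follow the $3N + 1$, $3N + 2$, $3N + 3$ pattern. The sketch assumes that the successes and failures are split separately, so that every subsample keeps its sample's failure percentage, as the per-subsample P and Q values in Figure 1 suggest. The case IDs, the seed, and the function names are hypothetical.

```python
import random

ROLES = ("validation", "cross-validation", "BAYS probability estimation")

def role(subsample_number: int) -> str:
    """Subsamples 3N+1, 3N+2, 3N+3 (N = 0..8) take the three roles in turn."""
    return ROLES[(subsample_number - 1) % 3]

def draw_subsamples(successes, failures, sample_size, pct_failures, rng):
    """One sample without repeated cases, split into three equal subsamples
    that each keep the sample's percentage of failures."""
    n_fail = sample_size * pct_failures // 100
    fail = rng.sample(failures, n_fail)
    succ = rng.sample(successes, sample_size - n_fail)
    parts = []
    for k in range(3):
        part = fail[k::3] + succ[k::3]
        rng.shuffle(part)
        parts.append(part)
    return parts

def build_layout(successes, failures, seed=1977):
    """Subsamples 1-27 in the order of Figure 1 (samples drawn independently)."""
    rng = random.Random(seed)
    layout, number = {}, 1
    for size in (1_500, 3_000, 6_000):
        for pct in (10, 35, 50):
            for part in draw_subsamples(successes, failures, size, pct, rng):
                layout[number] = part
                number += 1
    return layout

# Hypothetical case IDs: 7,694 successes and 3,537 failures, as in Section III.
ok_ids, fail_ids = list(range(7_694)), list(range(7_694, 11_231))
layout = build_layout(ok_ids, fail_ids)
print(len(layout), len(layout[1]), len(layout[19]), role(5))  # 27 500 2000 cross-validation
```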


Model Formulation and Analysis

The methodological comparisons began with the set of independent variables, called Variable Set I,
that comprised the predictive model developed by Dempsey et al. (1977). Four additional sets of
independent variables, denoted Variable Sets II through V, were examined; all five sets are listed in
Table 1. Factors influencing the selection of Variable Sets II through V were the following: (a) results of
analyses on Variable Set I, (b) a regression of the criterion on a large number of independent variables,
(c) the large increases in "turnaround" time associated with the BAYS computations as the number of
independent variables increases, (d) limitations on the number of predictor variables compatible with a
MAP analysis, and (e) coordination with the AFHRL focal point on RPR 77-13 concerning results of
analyses supporting that research effort.
Table 1. Sets of Independent Variables

I                II              III               IV                V
Admin + Elecᵃ    Mechanical      Mechanical        Administrative    Administrative
AFQTᵇ            Electrical      Electrical        Mechanical        Mechanical
MSIᵇ             General         PDA               Electrical        Electrical
EDUCᶜ            MSI             EDUCᶜ             General           General
Dependents       Education       Art               MSI               MSI
Ageᵈ             Art             Geometry          Education         EDUCᶜ
                 Geometry        Photography       Algebra           Algebra
                 Photography     English           Biology           Biology
                 English         Home Economics    Chemistry         Chemistry
                                                   Geometry          Geometry
                                                   Physics           Physics
                                                   Trigonometry      Trigonometry
                                                   English           English
                                                   Age               Ageᵉ

ᵃSum (normalized) of the scores from the administrative and electrical tests of the ASVAB.

ᵇNormalized score.

ᶜCoded as 0 (1) denoting number of years required to reach highest level of education less than 12 (greater than or
equal to 12).

ᵈCoded as 0 (1) denoting age (in years) at time of enlistment less than 19 (greater than or equal to 19).

ᵉCoded as 0 (1) denoting age (in years) at time of enlistment equal to 17 (greater than 17).
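The transformations in footnotes c through e are threshold rules. The sketch below writes them out, together with the five variable sets as read from Table 1. Counting the members gives the 6-, 9-, and 14-member set sizes that appear in the discussion of computer resources. The function names and the shortened variable labels are illustrative. The Set V age rule assumes that no enlistee is younger than 17, since the footnote defines only "equal to 17" and "greater than 17".

```python
def educ_code(years: int) -> int:
    """Footnote c: 0 if fewer than 12 years of education, 1 if 12 or more."""
    return 1 if years >= 12 else 0

def age_code_set_i(age: int) -> int:
    """Footnote d: 0 if younger than 19 at enlistment, 1 if 19 or older."""
    return 1 if age >= 19 else 0

def age_code_set_v(age: int) -> int:
    """Footnote e: 0 if exactly 17 at enlistment, 1 if older.
    Assumes no enlistee is younger than 17."""
    return 1 if age > 17 else 0

COURSES_IV_V = ["Algebra", "Biology", "Chemistry", "Geometry", "Physics",
                "Trigonometry", "English"]
VARIABLE_SETS = {
    "I": ["Admin + Elec", "AFQT", "MSI", "EDUC", "Dependents", "Age (d)"],
    "II": ["Mechanical", "Electrical", "General", "MSI", "Education", "Art",
           "Geometry", "Photography", "English"],
    "III": ["Mechanical", "Electrical", "PDA", "EDUC", "Art", "Geometry",
            "Photography", "English", "Home Economics"],
    "IV": ["Administrative", "Mechanical", "Electrical", "General", "MSI",
           "Education"] + COURSES_IV_V + ["Age"],
    "V": ["Administrative", "Mechanical", "Electrical", "General", "MSI",
          "EDUC"] + COURSES_IV_V + ["Age (e)"],
}
print({name: len(v) for name, v in VARIABLE_SETS.items()})
# {'I': 6, 'II': 9, 'III': 9, 'IV': 14, 'V': 14}
```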

Tables A14 through A27 in Appendix A, which will hereafter be referred to as "hit tables," present
results of the MAP, TRICOR, and BAYS methodologies applied to a validation and a cross-validation
subsample for each subsample size/base rate/variable set combination. An examination of the first set of
results in Table A14 provides the following information. For the 500-case validation subsample from a
MAP problem involving a 50% base rate, 157 individuals who were successes (i.e., assigned a criterion
value of 1) were classified as successes, and 173 individuals who were failures (i.e., assigned a criterion
value of 0) were classified as failures. In addition, 77 individuals who were failures were classified as
successes, and 93 individuals who were successes were classified as failures. Therefore, for this particular
validation subsample, 330 (or 157 + 173) individuals were correctly classified and 170 (or 77 + 93)
individuals were incorrectly classified. The classification accuracy was 66.0% (330/500) for the validation
subsample and 66.4% for the cross-validation subsample. The remaining hit tables comprising Tables A14
through A27 can be similarly interpreted.
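The arithmetic in this worked example can be checked mechanically. The helper below has a hypothetical name and is not part of TRICOR, MAP, or BAYS. It takes the four cells of a hit table and returns the number of correct and incorrect classifications, the classification accuracy, and the base rate implied by the actual successes. The inputs are the Table A14 validation counts of 157, 173, 77, and 93. The last value is the one implied by the stated total of 170 incorrect classifications. With these inputs the helper reproduces the 66.0% accuracy and the 50% base rate.

```python
def hit_table_summary(ss, ff, fs, sf):
    """Summarize a 2 x 2 hit table.
    ss: successes classified as successes   ff: failures classified as failures
    fs: failures classified as successes    sf: successes classified as failures"""
    total = ss + ff + fs + sf
    correct = ss + ff
    return {
        "correct": correct,
        "incorrect": fs + sf,
        "accuracy_pct": round(100 * correct / total, 1),
        "base_rate_pct": round(100 * (ss + sf) / total, 1),  # share of actual successes
    }

print(hit_table_summary(157, 173, 77, 93))
# {'correct': 330, 'incorrect': 170, 'accuracy_pct': 66.0, 'base_rate_pct': 50.0}
```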

As can be observed from these tables, there is little difference among the methodologies in their
ability to correctly classify the sampled cases into the two criterion categories. For example, the
classification accuracies from applying MAP and TRICOR to the validation and cross-validation
subsamples using Variable Set I differed by no more than 2% for the 18 subsample size/base rate
combinations, with neither methodology exhibiting clear superiority. In fact, 15 of the 18 differences were
less than 1%. For the nine validation subsamples, the classification accuracies for MAP were greater than
those for TRICOR for five problems and equal for two problems. For the nine cross-validation subsamples,
the classification accuracies for MAP were greater than those for TRICOR for four problems and equal for
two problems. As shown in Tables A14 and A16, the classification accuracies from applying MAP and
BAYS to Variable Set I differed by no more than 3% for all subsample size/base rate combinations, with a
majority of the differences being less than 1%. For the nine validation subsamples, the classification
accuracies for BAYS were greater than those for MAP for five problems and equal for two problems. For
the nine cross-validation subsamples, the classification accuracies for BAYS were greater than those for
MAP for three problems and less than those for MAP for six problems. A similar comparison for BAYS
and TRICOR can be derived from Tables A14 and A15. As before, a majority of the differences were less
than 1%. For the 18 subsample size/base rate combinations, the classification accuracies for BAYS were
greater than those for TRICOR for eight problems and equal for two problems. A comparison of
classification accuracy among the three methodologies across all variable sets provides similar results.
When the performance of the algorithms is examined as a function of base rate, sample size, or variable
set, there is little difference in their abilities to correctly classify individuals as successes/failures. The
application of each methodology increased classification accuracy substantially (i.e., an improvement of
approximately 13% to 23%) over the base rate for the subsamples containing 50% involuntary
dischargees. However, the improvement in classification accuracy decreased dramatically (i.e., an
improvement of at most 11%) for the subsamples containing 35% involuntary dischargees. It became
nearly nonexistent (i.e., an improvement of at most 2%) for the subsamples containing 10% involuntary
dischargees. As previously mentioned, the MAP algorithm did not converge for all problems, as the
omission of several hit tables shows. Therefore, all comparisons between MAP and BAYS or TRICOR refer
to the problems for which the MAP algorithm converged. It should be noted that for the three 65% base
rate problems utilizing Variable Set III, the TRICOR classification accuracy was better than the MAP
classification accuracy in all cases. However, when weighing the significance of this result,
consideration should be given to the large number of comparisons that were made in which none of the
methodologies showed clear superiority.
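The two kinds of summary used in this paragraph are simple to compute. One kind counts, problem by problem, whether one method's accuracy was greater than, equal to, or less than another's. The other is the percentage-point gain over the base rate. The sketch below assumes that accuracies are compared as reported, rounded to 0.1%. The accuracy lists are hypothetical and are not values taken from Tables A14 through A27. The one real figure is the Table A14 validation result of 66.0% at a 50% base rate, a 16-point gain that falls within the 13% to 23% range quoted above.

```python
def tally(acc_a, acc_b):
    """Count problems where method A's accuracy is greater than, equal to,
    or less than method B's (accuracies assumed rounded to 0.1%)."""
    pairs = list(zip(acc_a, acc_b))
    greater = sum(a > b for a, b in pairs)
    equal = sum(a == b for a, b in pairs)
    return greater, equal, len(pairs) - greater - equal

def gain_over_base_rate(accuracy_pct, base_rate_pct):
    """Percentage-point improvement over classifying everyone as a success."""
    return accuracy_pct - base_rate_pct

# Hypothetical accuracies (%) for three problems; not values from Tables A14-A27.
map_acc = [66.0, 70.2, 71.0]
tricor_acc = [66.0, 69.8, 72.1]
print(tally(map_acc, tricor_acc))       # (1, 1, 1)
print(gain_over_base_rate(66.0, 50.0))  # 16.0
```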

Using the AFHRL automatic interaction detector algorithm, AID4 (Gott & Koplyay, 1977; Koplyay,
Gott, & Elton, 1973), interactive terms were identified to gauge the improvement in classification
accuracy obtained by adding these variables to the appropriate set of predictors. As mentioned earlier,
the BAYS algorithm precludes analysis of models containing interactive terms. Since the classification
accuracy results of this effort were so similar to the previous results, the corresponding hit table summaries
were not reproduced in this report. When interactive terms were introduced into the MAP algorithm,
significant convergence problems were encountered. For example, when Variable Sets II and III were
augmented with interactive terms, the MAP algorithm did not converge for each problem. Some success in
achieving convergence was realized by performing MAP analyses on a subset of the augmented Variable Set
III; however, a similar attempt to achieve convergence on the augmented Variable Set II met with little
success. Based on the subsample size/base rate combinations for which the MAP algorithm converged
for problems with and without interactions, little predictive efficiency was gained by allowing interactive
terms to be included in the model. In fact, the largest improvement observed in classification accuracy
was 1.4%, with most of the differences being less than 1%. Similar results were observed for TRICOR: the
largest improvement in classification accuracy was 1.6%, with most of the differences being less than 1%.
Although the inclusion of interactive terms in these analyses did yield some increases in classification
accuracy, the magnitude of the increases would not justify development of a more complicated model.
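MAP's own convergence behavior cannot be reproduced here, because its likelihood equations and Brown's (1967) iterative method are documented elsewhere. As a generic, swapped-in illustration of how an iterative maximum likelihood solver can fail, the sketch below fits a one-predictor logistic model by Newton-Raphson and reports whether it converged. When a predictor perfectly separates successes from failures, no finite maximum likelihood estimate exists, and the iterations drift without settling. The report does not establish whether this is what happened with the augmented variable sets. The function names, tolerances, and data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def newton_logit(x, y, max_iter=25, tol=1e-8):
    """Fit P(y = 1) = sigmoid(b0 + b1*x) by Newton-Raphson.
    Returns (b0, b1, converged)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)         # weight in the information matrix
            g0 += yi - p              # score (gradient) components
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if not det > 1e-12:           # information matrix singular (or NaN)
            return b0, b1, False
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, True
    return b0, b1, False

x = [1, 2, 3, 4, 5, 6]
overlap = [0, 0, 1, 0, 1, 1]    # successes and failures overlap in x
separated = [0, 0, 0, 1, 1, 1]  # x perfectly separates the two categories
print(newton_logit(x, overlap)[2], newton_logit(x, separated)[2])  # True False
```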

Comparison of Required Computer Resources

Although the classification accuracy results are similar for TRICOR, BAYS, and MAP, there are
differences in the computer resources required to perform the computations for each methodology. All of
the comparisons presented refer to the version of each computer program presently operational on
the AFHRL UNIVAC 1108. The magnitude of the differences could vary depending on the computer
system employed. With an extensive research effort, each predictive algorithm could probably be
streamlined with respect to input/output (I/O) time, central processing unit (CPU) time, or mass storage
required. However, the contents of this section should serve as a valuable guide for researchers who wish
to estimate the computer resources required to perform each methodology on the AFHRL UNIVAC 1108
or a similar computer system without drastically modifying the computerized algorithms. If one of these
methodologies is to be used repeatedly as an operational tool to solve the type of problem investigated in
this report, an effort should be initiated to tailor the identified algorithm to the specific requirements of
that application.

As noted earlier, an increase in the number of independent variables associated with a BAYS problem
results in a dramatic increase in "turnaround" time. The total times required for BAYS processing of 6-,
9-, and 14-member variable sets for 500-case subsamples were approximately 27, 42, and 65 minutes,
respectively, with over 89% of those times allocated to I/O processing. Moreover, an increase in the
number of cases per subsample resulted in a proportionate increase in total (and I/O) processing time. The
total times required for MAP processing of 6-, 9-, and 14-member variable sets were approximately 3% to
4%, 4% to 5%, and 7% to 10%, respectively, of the total times required to process a similar BAYS problem,
with the CPU times comprising approximately 77% to 92% of the total time. A direct comparison of
TRICOR processing times with MAP and BAYS was not available, since the TRICOR processing involved
computations germane to a follow-up research effort but not required for the results herein. Any
estimates of TRICOR processing times should therefore be considered overestimates. The total times
required for TRICOR processing of 6-, 9-, and 14-member variable sets were approximately 13% to 17%,
8% to 13%, and 5% to 7%, respectively, of the total times required to process a similar BAYS problem,
with the CPU times comprising approximately 8% to 15% of the total time. In addition, the I/O times
comprise approximately 64% to 65%, 72% to 74%, and 76% to 80% of the total times for the 500-, 1,000-,
and 2,000-case subsamples, respectively.
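Because the MAP and TRICOR times above are stated relative to BAYS, converting them to minutes takes one multiplication. The sketch below uses only the figures quoted in this paragraph. For larger subsamples, it assumes that the reported proportional growth of BAYS time with the number of cases holds exactly. It also assumes that the percentage ranges apply at every subsample size, which the report does not state. Remember that the TRICOR figures are overestimates. The table and function names are hypothetical.

```python
BAYS_MINUTES_500 = {6: 27, 9: 42, 14: 65}            # BAYS total minutes, 500-case subsamples
MAP_PCT_OF_BAYS = {6: (3, 4), 9: (4, 5), 14: (7, 10)}
TRICOR_PCT_OF_BAYS = {6: (13, 17), 9: (8, 13), 14: (5, 7)}  # reported as overestimates

def bays_minutes(nvars: int, ncases: int) -> float:
    """Assumes BAYS time grows in proportion to the number of cases."""
    return BAYS_MINUTES_500[nvars] * ncases / 500

def minutes_range(pct_table, nvars, ncases=500):
    """Low and high time estimates (minutes) from a percentage-of-BAYS range."""
    lo, hi = pct_table[nvars]
    base = bays_minutes(nvars, ncases)
    return base * lo / 100, base * hi / 100

for k in (6, 9, 14):
    print(k, minutes_range(MAP_PCT_OF_BAYS, k), minutes_range(TRICOR_PCT_OF_BAYS, k))
```

For the 6-member sets this gives roughly 0.8 to 1.1 minutes for MAP against 3.5 to 4.6 minutes for TRICOR. For the 14-member sets, MAP (about 4.6 to 6.5 minutes) overtakes TRICOR (about 3.3 to 4.6 minutes).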

The I/O time required for a MAP problem is small in relation to the total time required, since a large
amount of information is retained in mass storage and very little file handling is necessary. However, mass
storage limitations restrict the size of problems that can be processed, as noted in an earlier
discussion. The total time required to process a MAP problem surpasses the total time required to process
a similar TRICOR problem for Variable Set IV for most problems. For the 14-member sets, MAP required
about 7% to 10% of the BAYS time versus 5% to 7% for TRICOR, whereas for the 6-member sets MAP required
only 3% to 4% versus 13% to 17%. It therefore appears that the TRICOR algorithm becomes more efficient
than the MAP algorithm with respect to total time required as the number of independent variables
associated with the problem increases. The CPU times required to process a BAYS or MAP problem are
comparable, but the I/O times presently required to process BAYS problems limit the use of this
methodology to the solution of smaller problems than could be processed by the TRICOR or MAP
algorithms. Of course, for problems involving a large number of cases and predictor variables, the TRICOR
algorithm presently provides a method to seek an acceptable solution within reasonable time and mass
storage constraints.

VI.  SUMMARY  AND  RECOMMENDATIONS 

In  order  to  measure  the  abilities  of  the  MAP.  BAYS,  and  TRICOR  algorithms  to  correctly  classify 
individuals  as  normal  dischargees  (including  active  duty  status)  or  involuntary  dischargees,  a  population  of 
I  1 .231  airmen  was  selected  that  was  characteristically  similar  to  one  that  had  served  as  a  data  base  for  a 
MAP  analysis  documented  by  Dempsey  et  al.  ( 1977).  The  current  effort  is  the  first  phase  in  a  project  to 
examine  the  capabilities  of  each  methodology  to  correctly  classify  individuals  in  binary  criterion  situations 
such  as  graduation/elimination  from  various  TT  courses,  UPT  and  BMT. 

To examine the classification accuracies of the statistical methodologies in a variety of problem settings,
subsamples were constructed so that all possible combinations of three subsample sizes (500, 1000, and 2000
cases) and three base rates (50%, 65%, and 90%) could be analyzed for each set of predictor variables.
Several subsets of the following variables, and/or transformations of them, were selected for the
development of predictive models by each methodology: (a) scores from the aptitude tests of the ASVAB
(Administrative, Mechanical, Electrical, and General), (b) AFQT score, (c) PDA score, (d) MSI score,
(e) number of years required to reach the highest level of education, (f) number of dependents at
enlistment, (g) age in years at enlistment, and (h) high school completion of algebra, biology, chemistry,
art, geometry, photography, physics, trigonometry, English, and home economics. The classification
accuracies and computer resource requirements associated with applying each statistical methodology to
every subsample size/base rate/variable set combination were compared, resulting in several general
conclusions. Overall, there was very little difference among the methodologies in their ability to correctly
classify individuals as successes or failures. Each methodology produced a substantial increase in
classification accuracy over the base rate for the subsamples containing 50% involuntary dischargees;
however, this improvement was less pronounced for the subsamples containing 35% involuntary
dischargees and smaller still for those containing 10% involuntary dischargees. Including the interaction
terms identified by AID4 in the model-building process did not increase classification accuracy enough to
justify developing a more complicated model. Convergence problems were encountered during the MAP
analyses, especially when some of the predictor variable sets were augmented with interaction terms;
therefore, a comparison of predictive efficiency among all methodologies is not available for every
subsample size/base rate/variable set combination.
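Each hit table in the appendix is a 2 × 2 cross-classification of predicted against actual group membership, and classification accuracy is the percentage of cases on its main diagonal. The Python sketch below is offered only as an illustration. It recomputes the accuracies of the MAP validation hit tables for Variable Set I at a subsample size of 500 (Table A14) and compares each one with the base rate. The `HitTable` class and its method names are hypothetical. Treating the base rate as the accuracy obtained by assigning every case to the larger group is an assumption made here for the comparison.

```python
from typing import NamedTuple


class HitTable(NamedTuple):
    """2 x 2 hit table laid out as in the appendix (hypothetical helper)."""
    p1_a1: int  # predicted 1, actual 1
    p1_a0: int  # predicted 1, actual 0
    p0_a1: int  # predicted 0, actual 1
    p0_a0: int  # predicted 0, actual 0

    @property
    def n(self) -> int:
        return self.p1_a1 + self.p1_a0 + self.p0_a1 + self.p0_a0

    def accuracy(self) -> float:
        """Percent of cases classified correctly (main diagonal)."""
        return 100.0 * (self.p1_a1 + self.p0_a0) / self.n

    def base_rate(self) -> float:
        """Percent of cases whose actual value is 1."""
        return 100.0 * (self.p1_a1 + self.p0_a1) / self.n

    def gain_over_base_rate(self) -> float:
        """Accuracy minus the accuracy of assigning everyone to the larger group (assumption)."""
        majority = max(self.base_rate(), 100.0 - self.base_rate())
        return self.accuracy() - majority


# Validation hit tables: MAP, Variable Set I, subsample size 500 (Table A14).
tables = {
    "50%": HitTable(157, 77, 93, 173),
    "65%": HitTable(299, 111, 26, 64),
    "90%": HitTable(447, 42, 3, 8),
}

for rate, table in tables.items():
    print(f"base rate {rate}: accuracy {table.accuracy():.1f}%, "
          f"gain {table.gain_over_base_rate():+.1f} points")
```

For these three tables, the gain over the base rate falls from 16.0 points at the 50% base rate to 7.6 points at 65% and 1.0 point at 90%. This follows the pattern summarized in the preceding paragraph.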

Although the classification accuracy results were similar, the methodologies differed in the computer
resources required to process the data. For all analyses, the total time required to process a BAYS problem
was appreciably longer than that required to process a similar MAP or TRICOR problem, mainly because of
the large amount of I/O time associated with the BAYS computations. If some proposed changes to the
BAYS algorithm are implemented, the I/O time required for a BAYS problem could possibly be reduced by
one-half; even with this reduction, however, the total times for the BAYS problems would still have greatly
exceeded those for similar MAP or TRICOR problems. Although the total time required to process each
MAP problem was appreciably less than that required for BAYS, the CPU time required for a MAP problem
increases rather rapidly as the number of independent variables increases; consequently, it is especially
important with MAP, as with the other methodologies, to employ an efficient variable selection technique.
Because of mass storage limitations, an increase in the number of independent variables in a MAP problem
causes a corresponding decrease in the maximum number of cases that can be analyzed. If a particular
problem involves a large number of cases and predictor variables, the superior data-handling capabilities of
the TRICOR regression algorithm take on added significance; in fact, TRICOR may be the only one of the
three methods that can feasibly obtain a solution.

Currently, AFHRL is conducting two follow-up studies to this effort. The first examines the capabilities of
the MAP, TRICOR, and BAYS computerized methodologies to correctly classify individuals as TT
graduates/failures, and the second compares the abilities of each methodology to correctly identify BMT
graduates/failures. A major difference between the present and new efforts is that, rather than drawing the
validation and cross-validation subsamples from the same population, the test design for the TT (BMT)
study requires the validation subsamples to be randomly selected from personnel who entered TT (BMT) in
1976 and the cross-validation subsamples to be randomly selected from personnel who entered TT (BMT) in
1977. In addition, the predictive efficiency of standardized least squares will be measured in a variety of
problem settings. Because the validation and cross-validation subsamples are not necessarily homogeneous,
standardized least squares predictive models, which are independent of the units of measurement, may fare
better than ordinary least squares predictive models. The BMT and TT research efforts should be pursued,
since they more closely simulate a "real world" prediction problem in that data from one time period are
used to develop a model for prediction into the next time period.
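A short sketch can illustrate the claim that standardized least squares weights do not depend on the units of measurement. It assumes NumPy, fits a 0/1 criterion by least squares (a linear probability model, chosen here only for illustration), and uses synthetic, hypothetical data rather than data from this study. Re-expressing one predictor in different units (here, multiplying it by 12, as in converting years to months) rescales that predictor's ordinary least squares weight. The standardized weights stay the same, because each standardized weight equals the ordinary weight multiplied by s_x / s_y.

```python
import numpy as np


def ols_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares slopes; an intercept is fitted but not returned."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def standardized_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least squares slopes after converting every variable to z-scores."""
    z_X = (X - X.mean(axis=0)) / X.std(axis=0)
    z_y = (y - y.mean()) / y.std()
    return ols_weights(z_X, z_y)


# Hypothetical data (not from this study): two predictors and a 0/1 criterion.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * [20.0, 1.5] + [60.0, 18.8]
signal = 0.02 * (X[:, 0] - 60.0) - 0.3 * (X[:, 1] - 18.8)
y = (signal + rng.normal(size=500) > 0).astype(float)

# The same data with the second predictor expressed in months instead of years.
X_months = X * [1.0, 12.0]

print("OLS weights:         ", ols_weights(X, y), ols_weights(X_months, y))
print("Standardized weights:", standardized_weights(X, y), standardized_weights(X_months, y))
```

The sketch shows only that the weights are independent of units. Whether standardized models actually cross-validate better across nonhomogeneous subsamples is the empirical question the follow-up studies are designed to answer.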
Table A1.  Distribution of the ASVAB Administrative
Aptitude Test Scores for the First-Term
Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
(Percentile)           Number      Percent
<30                     1,157         10.3
30-39                     775          6.9
40-49                   1,761         15.7
50-59                   2,092         18.6
60-69                   2,012         17.9
70-79                   1,375         12.2
80-89                   1,158         10.3
90-99                     901          8.0

Table A2.  Distribution of the ASVAB Mechanical
Aptitude Test Scores for the First-Term
Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
(Percentile)           Number      Percent
<30                       793          7.1
30-39                     898          8.0
40-49                   1,161         10.3
50-59                   2,589         23.1
60-69                   2,027         18.0
70-79                   1,375         12.2
80-89                   1,250         11.1
90-99                   1,138         10.1

Table A3.  Distribution of the ASVAB Electrical
Aptitude Test Scores for the First-Term
Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
(Percentile)           Number      Percent
<30                       538          4.8
30-39                     616          5.5
40-49                   1,728         15.4
50-59                   1,969         17.5
60-69                   2,059         18.3
70-79                   1,103          9.8
80-89                   1,820         16.2
90-99                   1,398         12.4

Table A4.  Distribution of the ASVAB General
Aptitude Test Scores for the First-Term
Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
(Percentile)           Number      Percent
<50                     2,634         23.5
50-59                   1,979         17.6
60-69                   2,522         22.5
70-79                   1,521         13.5
80-89                   1,483         13.2
90-99                   1,092          9.7

Table A5.  Distribution of the AFQT Scores
for the First-Term Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
(Percentile)           Number      Percent
<30                       262          2.3
30-39                   1,793         16.0
40-49                   1,544         13.7
50-59                   1,781         15.9
60-69                   1,599         14.2
70-79                   1,486         13.2
80-89                   1,791         15.9
90-99                     975          8.7

Table A6.  Distribution of the PDA Scores
for the First-Term Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
                       Number      Percent
0-2                     2,318         20.6
3-5                     2,498         31.1
6-8                     2,623         23.4
9-11                    1,459         13.0
12-14                     766          6.8
15-17                     344          3.1
>17                       223          2.0

Table A7.  Distribution of the MSI Scores
for the First-Term Airman Population

                     First-Term Airmen Falling
Score Interval           in Score Interval
                       Number      Percent
0-3                     3,321         29.6
4-7                     4,164         37.1
8-11                    2,394         21.3
12-15                     936          8.3
16-19                     318          2.8
>19                        98           .9

Table A8.  Distribution of Education
for the First-Term Airman Population

                     First-Term Airmen Falling
Interval                    in Interval
(Years)                Number      Percent
<12                     1,542         13.7
12                      8,862         78.9
13                        364          3.2
14                        250          2.2
>14                       213          1.9

Table A9.  Distribution of Number of Dependents
at Enlistment for the First-Term
Airman Population

                     First-Term Airmen Falling
Interval                    in Interval
                       Number      Percent
0-2                    11,115         99.0
3-5                       116          1.0

Table A10.  Distribution of Completion/Noncompletion of
High School Courses for the First-Term Airman Population

                       Completion         Noncompletion
Course              Number   Percent     Number   Percent
Algebra              8,262      73.6      2,969      26.4
Biology              8,417      74.9      2,814      25.1
Chemistry            3,511      31.3      7,720      68.7
Art                  1,567      14.0      9,664      86.0
Geometry             5,597      49.8      5,634      50.2
Photography          1,653      14.7      9,578      85.3
Physics              2,045      18.2      9,186      81.8
Trigonometry         2,172      19.3      9,059      80.7
English             10,593      94.3        638       5.7
Home Economics       1,905      17.0      9,326      83.0

Table A11.  Distribution of Age at Enlistment
for the First-Term Airman Population

                     First-Term Airmen Falling
Interval                    in Interval
(Years)                Number      Percent
17                      1,432         12.8
18                      4,126         36.7
19                      2,990         26.6
20                      1,452         12.9
21                        609          5.4
22                        331          2.9
23                        137          1.2
>23                       154          1.4

Table A12.  Means and Standard Deviations
of the Predictive Variables for the
First-Term Airman Population

Predictive Variable        Mean        SD
Administrative            56.71     20.67
Mechanical                58.97     20.31
Electrical                62.02     20.08
General                   62.03     17.95
AFQT                      60.82     19.91
PDA                        6.16      4.30
MSI                        6.29      4.28
Education                 11.93       .91
Dependents                  .00       .02
Algebra                     .74       .44
Biology                     .75       .43
Chemistry                   .31       .46
Art                         .14       .35
Geometry                    .50       .50
Photography                 .15       .35
Physics                     .18       .39
Trigonometry                .19       .40
English                     .94       .23
Home Economics              .17       .38
Age                       18.84      1.48
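The entries in Table A12 for the ten high school course variables follow from the counts in Table A10 if each course is coded 1 for completion and 0 for noncompletion. That coding is an assumption, although the tabled means equal the completion proportions. A 0/1 variable with proportion p of ones has mean p and standard deviation √(p(1 − p)), so the table can be checked with a few lines of Python. The sketch uses the population standard deviation, which differs negligibly from the sample value at N = 11,231. The helper name `binary_mean_sd` is hypothetical.

```python
from math import sqrt

N = 11_231  # size of the first-term airman population

# Course: (number completing, from Table A10; mean and SD, from Table A12)
COURSES = {
    "Algebra": (8262, 0.74, 0.44),
    "Biology": (8417, 0.75, 0.43),
    "Chemistry": (3511, 0.31, 0.46),
    "Art": (1567, 0.14, 0.35),
    "Geometry": (5597, 0.50, 0.50),
    "Photography": (1653, 0.15, 0.35),
    "Physics": (2045, 0.18, 0.39),
    "Trigonometry": (2172, 0.19, 0.40),
    "English": (10593, 0.94, 0.23),
    "Home Economics": (1905, 0.17, 0.38),
}


def binary_mean_sd(count: int, n: int) -> tuple[float, float]:
    """Mean and population standard deviation of a 0/1 variable with `count` ones."""
    p = count / n
    return p, sqrt(p * (1.0 - p))


for course, (count, table_mean, table_sd) in COURSES.items():
    mean, sd = binary_mean_sd(count, N)
    print(f"{course:<15} mean {mean:.3f} (table {table_mean:.2f})   "
          f"SD {sd:.3f} (table {table_sd:.2f})")
```

Every recomputed mean and standard deviation agrees with Table A12 to within .01. The largest difference is for the trigonometry SD, about .395 computed against .40 tabled.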

Table A14.  Hit Tables of MAP Applied to Variable Set I for
Each Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               157        77      66.0       163        81      66.4
                0                93       173                  87       169
1000      50%   1               332       131      70.1       327       155      67.2
                0               168       369                 173       345
2000      50%   1               650       304      67.3       642       303      67.0
                0               350       696                 358       697
500       65%   1               299       111      72.6       292       114      70.6
                0                26        64                  33        61
1000      65%   1               561       176      73.5       536       188      69.8
                0                89       174                 114       162
2000      65%   1              1109       372      71.8      1090       369      71.0
                0               191       328                 210       331
500       90%   1               447        42      91.0       443        47      89.2
                0                 3         8                   7         3
1000      90%   1               894        89      90.5       898        94      90.4
                0                 6        11                   2         6
2000      90%   1              1794       186      90.4      1795       189      90.3
                0                 6        14                   5        11

Table A15.  Hit Tables of TRICOR Applied to Variable Set I for
Each Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               192       102      68.0       181       104      65.4
                0                58       148                  69       146
1000      50%   1               326       128      69.8       315       141      67.4
                0               174       372                 185       359
2000      50%   1               576       233      67.2       575       244      66.6
                0               424       767                 425       756
500       65%   1               290       101      72.8       288       102      72.2
                0                35        74                  37        73
1000      65%   1               562       177      73.5       536       183      70.3
                0                88       173                 114       167
2000      65%   1              1078       347      71.6      1053       345      70.4
                0               222       353                 247       355
500       90%   1               448        44      90.8       444        48      89.2
                0                 2         6                   6         2
1000      90%   1               894        91      90.3       898        94      90.4
                0                 6         9                   2         6
2000      90%   1              1797       190      90.4      1796       191      90.2
                0                 3        10                   4         9

Table A16.  Hit Tables of BAYS Applied to Variable Set I for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               189        94      69.0       182       110      64.4
                0                61       156                  68       140
1000      50%   1               346       155      69.1       333       178      65.5
                0               154       345                 167       322
2000      50%   1               736       386      67.5       703       390      65.6
                0               264       614                 297       610
500       65%   1               287        96      73.2       273       103      69.0
                0                38        79                  52        72
1000      65%   1               589       204      73.5       558       203      70.5
                0                61       146                  92       147
2000      65%   1              1181       437      72.2      1186       445      72.0
                0               119       263                 114       255
500       90%   1               450        47      90.6       448        47      90.2
                0                 0         3                   1         3
1000      90%   1               893        87      90.6       891        92      89.9
                0                 7        13                   9         8
2000      90%   1              1796       189      90.4      1796       196      90.0
                0                 4        11                   4         4

Table A17.  Hit Tables of MAP Applied to Variable Set II for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   *
1000      50%   *
2000      50%   *
500       65%   1               302       109      73.6       297       132      68.0
                0                23        66                  28        43
1000      65%   1               605       217      73.8       591       228      71.3
                0                45       133                  59       122
2000      65%   1              1161       412      72.4      1162       398      73.2
                0               139       288                 138       302
500       90%   1               449        47      90.4       445        49      89.2
                0                 1         3                   5         1
1000      90%   1               900        89      91.1       898        91      90.7
                0                 0        11                   2         9
2000      90%   *

*The MAP algorithm did not converge.

Table A18.  Hit Tables of TRICOR Applied to Variable Set II for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               165        65      70.0       166        77      67.8
                0                85       185                  84       173
1000      50%   1               309       129      68.0       300       135      66.5
                0               191       371                 200       365
2000      50%   1               813       464      67.4       775       466      65.4
                0               187       536                 225       534
500       65%   1               295       103      73.4       289       122      68.4
                0                30        72                  36        53
1000      65%   1               603       217      73.6       594       228      71.6
                0                47       133                  56       122
2000      65%   1              1162       409      72.6      1152       396      72.8
                0               138       291                 148       304
500       90%   1               449        46      90.6       450        48      90.4
                0                 1         4                   0         2
1000      90%   1               897        88      90.9       895        89      90.6
                0                 3        12                   5        11
2000      90%   1              1791       181      90.5      1792       177      90.8
                0                 9        19                   8        23

Table A19.  Hit Tables of BAYS Applied to Variable Set II for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               180        80      70.0       166       101      63.0
                0                70       170                  84       149
1000      50%   1               334       160      67.4       325       165      66.0
                0               166       340                 175       335
2000      50%   1               728       382      67.3       714       392      66.1
                0               272       618                 286       608
500       65%   1               277        79      74.6       256       104      65.4
                0                48        96                  69        71
1000      65%   1               593       204      73.9       590       217      72.3
                0                57       146                  60       133
2000      65%   1              1180       417      73.2      1158       425      71.6
                0               120       283                 142       275
500       90%   1               449        44      91.0       448        47      90.2
                0                 1         6                   2         3
1000      90%   1               892        81      91.1       890        83      90.7
                0                 8        19                  10        17
2000      90%   1              1796       187      90.4      1797       186      90.6
                0                 4        13                   3        14

Table A20.  Hit Tables of MAP Applied to Variable Set III for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   *
1000      50%   *
2000      50%   *
500       65%   1               294        97      74.4       284       108      70.2
                0                31        78                  41        67
1000      65%   1               600       220      73.0       605       231      72.4
                0                50       130                  45       119
2000      65%   1              1273       667      65.3      1276       668      65.4
                0                27        33                  24        32
500       90%   1               450        43      91.4       448        44      90.8
                0                 0         7                   2         6
1000      90%   1               900        86      91.4       895        84      91.1
                0                 0        14                   5        16
2000      90%   1              1798       182      90.8      1796       183      90.6
                0                 2        18                   4        17

*The MAP algorithm did not converge.




Table A21.  Hit Tables of TRICOR Applied to Variable Set III for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               191        82      71.8       185        92      68.6
                0                59       168                  65       158
1000      50%   1               317       129      68.8       318       142      67.6
                0               183       371                 182       358
2000      50%   1               731       350      69.0       685       349      66.8
                0               269       650                 315       651
500       65%   1               303        99      75.8       292       109      71.6
                0                22        76                  33        66
1000      65%   1               578       195      73.3       581       203      72.8
                0                72       155                  69       147
2000      65%   1              1097       331      73.3      1073       334      72.0
                0               203       369                 227       366
500       90%   1               446        39      91.4       446        41      91.0
                0                 4        11                   4         9
1000      90%   1               900        87      91.3       897        86      91.1
                0                 0        13                   3        14
2000      90%   1              1793       181      90.6      1795       177      90.9
                0                 7        19                   5        23

Table A22.  Hit Tables of BAYS Applied to Variable Set III for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               198        83      73.0       181        98      66.6
                0                52       167                  69       152
1000      50%   1               374       176      69.8       367       190      67.7
                0               126       324                 133       310
2000      50%   1               725       347      68.9       681       353      66.4
                0               275       653                 319       647
500       65%   1               291       102      72.8       291       115      70.2
                0                34        73                  34        60
1000      65%   1               580       189      74.1       568       199      71.9
                0                70       161                  82       151
2000      65%   1              1166       397      73.4      1156       402      72.7
                0               134       303                 144       298
500       90%   1               444        34      92.0       439        40      89.8
                0                 6        16                  11        10
1000      90%   1               898        86      91.2       894        86      90.8
                0                 2        14                   6        14
2000      90%   1              1791       178      90.6      1795       176      91.0
                0                 9        22                   5        24

Table A23.  Hit Tables of MAP Applied to Variable Set IV for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               183        87      69.2       177        85      68.4
                0                67       163                  73       165
1000      50%   *
2000      50%   1               705       376      66.4       711       370      67.0
                0               295       624                 289       630
500       65%   1               274        85      72.8       271       105      68.2
                0                51        90                  54        70
1000      65%   1               608       220      73.8       602       241      71.1
                0                42       130                  48       109
2000      65%   1              1186       447      72.0      1190       443      72.4
                0               114       253                 110       257
500       90%   1               449        46      90.6       449        47      90.4
                0                 1         4                   1         3
1000      90%   *
2000      90%   1              1789       175      90.7      1786       168      90.9
                0                11        25                  14        32

*The MAP algorithm did not converge.


Table A24.  Hit Tables of TRICOR Applied to Variable Set IV for Each
Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               171        76      69.0       152        76      65.2
                0                79       174                  98       174
1000      50%   1               343       164      67.9       339       170      66.9
                0               157       336                 161       330
2000      50%   1               615       274      67.0       609       287      66.1
                0               385       726                 391       713
500       65%   1               262        77      72.0       254        93      67.2
                0                63        98                  71        82
1000      65%   1               609       224      73.5       605       242      71.3
                0                41       126                  45       108
2000      65%   1              1171       424      72.4      1165       418      72.4
                0               129       276                 135       282
500       90%   1               449        46      90.6       448        45      90.6
                0                 1         4                   2         5
1000      90%   1               899        91      90.8       897        91      90.6
                0                 1         9                   3         9
2000      90%   1              1796       183      90.6      1794       178      90.8
                0                 4        17                   6        22

Table A25.  Hit Tables of BAYS Applied to Variable Set IV (or V*) for
Each Subsample Size - Base Rate Combination

Subsample Base                      Validation                Cross Validation
Size      Rate  Predicted  Actual 1  Actual 0  Acc. (%)  Actual 1  Actual 0  Acc. (%)
500       50%   1               181        79      70.4       165        88      65.4
                0                69       171                  85       162
1000      50%   1               345       165      68.0       353       188      66.5
                0               155       335                 147       312
2000      50%   1               694       349      67.2       674       340      66.7
                0               306       651                 326       660
500       65%   1               281        89      73.4       280       113      68.4
                0                44        86                  45        62
1000      65%   1               585       199      73.6       576       204      72.2
                0                65       151                  74       146
2000      65%   1              1142       374      73.4      1111       390      71.0
                0               158       326                 189       310
500       90%   1               449        43      91.2       447        44      90.6
                0                 1         7                   3         6
1000      90%   1               895        83      91.2       889        85      90.4
                0                 5        17                  11        15
2000      90%   1              1792       181      90.6      1794       177      90.8
                0                 8        19                   6        23

*Categorizing the predictive variables in Variable Sets IV and V resulted in identical sets of variables for BAYS
analyses.


Table  A26.  Hit  Tables  of  MAP  Applied  to  Variable  Set  V  for  Each 
Subsample  Size  -  Base  Rate  Combination 


                                         Validation          Cross Validation
                                      Actual 1  Actual 0     Actual 1  Actual 0

Subsample Size = 500     Predicted 1       190        95          177        88
Base Rate = 50%          Predicted 0        60       155           73       162
Classification Accuracy (%)                 69.0                   67.8

Subsample Size = 1000    Predicted 1
Base Rate = 50%          Predicted 0                     *
Classification Accuracy (%)

Subsample Size = 2000    Predicted 1       784       438          768       449
Base Rate = 50%          Predicted 0       216       562          232       551
Classification Accuracy (%)                 67.3                   66.0

Subsample Size = 500     Predicted 1       273        80          269        98
Base Rate = 65%          Predicted 0        52        95           56        77
Classification Accuracy (%)                 73.6                   69.2

Subsample Size = 1000    Predicted 1       606       224          603       244
Base Rate = 65%          Predicted 0        44       126           47       106
Classification Accuracy (%)                 73.2                   70.9

Subsample Size = 2000    Predicted 1      1178       424         1157       431
Base Rate = 65%          Predicted 0       122       276          143       269
Classification Accuracy (%)                 72.7                   71.3

Subsample Size = 500     Predicted 1       449        48          450        48
Base Rate = 90%          Predicted 0         1         2            0         2
Classification Accuracy (%)                 90.2                   90.4

Subsample Size = 1000    Predicted 1       899        89          897        88
Base Rate = 90%          Predicted 0         1        11            3        12
Classification Accuracy (%)                 91.0                   90.9

Subsample Size = 2000    Predicted 1      1796       186         1795       182
Base Rate = 90%          Predicted 0         4        14            5        18
Classification Accuracy (%)                 90.5                   90.6

*The  MAP  algorithm  did  not  converge. 
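Each hit table in this appendix cross-tabulates predicted class against actual class. The reported classification accuracy is the percentage of cases on the diagonal (Predicted 1/Actual 1 plus Predicted 0/Actual 0), and the Actual 1 column, summed over both rows, equals the base rate times the subsample size. The Python sketch below recomputes these quantities for two validation tables taken from the table that precedes Table A26. It is an illustrative check only, not the program used to produce these tables, and the class name `HitTable` and its field names are hypothetical. It also computes the accuracy of a rule that assigns every case to the more frequent actual class. That figure is simply the larger of the base rate and its complement, and it offers a reference point for reading the 90% base-rate rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitTable:
    """2 x 2 hit table: rows are the predicted class, columns the actual class.

    The class and field names are hypothetical labels chosen for this sketch.
    """

    p1_a1: int  # predicted 1, actual 1
    p1_a0: int  # predicted 1, actual 0
    p0_a1: int  # predicted 0, actual 1
    p0_a0: int  # predicted 0, actual 0

    @property
    def n(self) -> int:
        """Subsample size: the total of all four cells."""
        return self.p1_a1 + self.p1_a0 + self.p0_a1 + self.p0_a0

    def accuracy_pct(self) -> float:
        """Percentage of cases on the diagonal (correctly classified)."""
        return 100.0 * (self.p1_a1 + self.p0_a0) / self.n

    def base_rate_pct(self) -> float:
        """Percentage of cases whose actual class is 1 (the Actual 1 column)."""
        return 100.0 * (self.p1_a1 + self.p0_a1) / self.n

    def majority_pct(self) -> float:
        """Accuracy of assigning every case to the more frequent actual class."""
        rate = self.base_rate_pct()
        return max(rate, 100.0 - rate)


# Validation hit tables for subsample size 2000, copied from the table
# that precedes Table A26 (cell order: p1_a1, p1_a0, p0_a1, p0_a0).
tables = {
    "N = 2000, base rate 65%": HitTable(1142, 374, 158, 326),
    "N = 2000, base rate 90%": HitTable(1792, 181, 8, 19),
}
for label, table in tables.items():
    print(f"{label}: accuracy {table.accuracy_pct():.2f}%, "
          f"majority-class rule {table.majority_pct():.1f}%")
```

Run as written, the loop reports accuracies of 73.40% and 90.55% against majority-class figures of 65.0% and 90.0%; the tabled values of 73.4 and 90.6 are these percentages rounded to one decimal place.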
Table A27. Hit Tables of TRICOR Applied to Variable Set V for Each
Subsample  Size  -  Base  Rate  Combination 


                                         Validation          Cross Validation
                                      Actual 1  Actual 0     Actual 1  Actual 0

Subsample Size = 500     Predicted 1       171        70          156        69
Base Rate = 50%          Predicted 0        79       180           94       181
Classification Accuracy (%)                 70.2                   67.4

Subsample Size = 1000    Predicted 1       334       153          337       161
Base Rate = 50%          Predicted 0       166       347          163       339
Classification Accuracy (%)                 68.1                   67.6

Subsample Size = 2000    Predicted 1       737       397          730       391
Base Rate = 50%          Predicted 0       263       603          270       609
Classification Accuracy (%)                 67.0                   67.0

Subsample Size = 500     Predicted 1       270        80          267       104
Base Rate = 65%          Predicted 0        55        95           58        71
Classification Accuracy (%)                 73.0                   67.6

Subsample Size = 1000    Predicted 1       603       220          602       241
Base Rate = 65%          Predicted 0        47       130           48       109
Classification Accuracy (%)                 73.3                   71.1

Subsample Size = 2000    Predicted 1      1143       395         1135       399
Base Rate = 65%          Predicted 0       157       305          165       301
Classification Accuracy (%)                 72.4                   71.8

Subsample Size = 500     Predicted 1       448        45          448        45
Base Rate = 90%          Predicted 0         2         5            2         5
Classification Accuracy (%)                 90.6                   90.6

Subsample Size = 1000    Predicted 1       899        90          897        89
Base Rate = 90%          Predicted 0         1        10            3        11
Classification Accuracy (%)                 90.9                   90.8

Subsample Size = 2000    Predicted 1      1785       176         1782       171
Base Rate = 90%          Predicted 0        15        24           18        29
Classification Accuracy (%)                 90.4                   90.6